Reconstructing hadronically decaying top quarks is a key challenge at the LHC, affecting a long 
list of Higgs analyses and new physics searches. We propose a new method of collecting jets in 
buckets, corresponding to top quarks and initial state radiation. This method is particularly well 
suited for moderate transverse momenta of the top quark, closing the gap between top taggers and 
traditional top reconstruction. Applying it to searches for supersymmetric top squarks we illustrate 
the power of buckets. 



I. INTRODUCTION



An important difference between the Tevatron and the LHC is that the latter can produce and study top quarks 
in great numbers [1]. This allows us to investigate all different top production mechanisms in detail, including 
their QCD structure. After the discovery of a Higgs-like resonance [2], studying its coupling to the top quark 
will play a particularly critical role in our understanding of the Higgs sector. This is made most obvious in 
the renormalization group evolution of the Higgs potential to large energy scales [3]. The direct measurement 
of the top Yukawa coupling clearly hinges on top quark identification and reconstruction. At the same time, 
we have reason to suspect that new physics that solves the hierarchy problem and lives at sufficiently high 
energy scales tends to couple strongly to top quarks. This motivates us to search for new physics in the 
LHC top sample, for example by searching for top pair resonance structures or top pair production associated 
with missing transverse momentum. 

Historically, the study of top pair production has largely been restricted to semi-leptonic decays of the two 
top quarks. The reason is that the lepton effectively removes the overwhelming QCD background. However, 
purely leptonic top pairs not only come at a much smaller rate, they also include two neutrinos, challenging 
any analysis based on the observed missing transverse momentum. A major challenge in top physics at the 
LHC is how to gain access to the purely hadronic decays of top quarks. 

Identifying hadronic top decays using a jet algorithm was part of the original proposal of jet substructure 
analyses [5][6]. Some of the early jet substructure algorithms were designed to target hadronic top decays [7]. 
While Higgs taggers [5] should clearly have a high priority within the LHC experiments, working top taggers 
are the perfect laboratory to test how well substructure approaches work in practice. 

The moment we go beyond searches for heavy resonances the main problem of all top taggers is the size 
of the initial fat jet. For example, using a Cambridge-Aachen jet of size R = 1.5 as the starting point of 
the HEPTopTagger [9][10] limits the momentum range of reconstructable top quarks to $p_{T,t} \gtrsim 200$ GeV. 
Essentially all other top tagging approaches require an even higher boost. Increasing the size of the fat jet to 
R = 1.8 raises several QCD and combinatorics issues [12]. The big question in using hadronic top analyses as 
part of Higgs searches or top partner searches is how to further reduce this top momentum threshold. 

In this paper we propose an alternative method for an efficient top reconstruction at moderate momentum. 
It targets the transverse momentum regime 

$p_{T,t} = 100 - 350$ GeV (1) 

in the fully hadronic decay mode. Starting from an event with a high multiplicity of jets, we assign all jets to 
three groups or 'buckets'. The buckets are chosen based on a metric in terms of invariant masses, defining two 
top buckets and a third bucket containing the extra hadronic activity like initial state radiation (ISR). While 
initially this search strategy does not prefer boosted top quarks, we will see how such events are eventually 
preferred from a combinatorics perspective. 
In Section II we start with a simple algorithm for reconstructing tops in buckets. We test this algorithm 
for hadronically decaying top pairs as well as W+jets and pure QCD jets backgrounds. Additional handles 
will help us separate the top signal from the backgrounds. In Section III we modify the simple algorithm to 
take advantage of the b quarks and W bosons that are present in top decays but not in the QCD backgrounds. 
This improved bucket algorithm is optimized to efficiently find and reconstruct top pairs with moderate $p_T$. In 
Section IV we apply our bucket algorithm to stop pair searches. 



II. SIMPLE BUCKET ALGORITHM 



In this section, we start with a simple algorithm to identify and reconstruct hadronically decaying top pairs. 
While an improved algorithm will be presented in the next section, this simple version captures many of the key 
concepts we will employ later. The overall scheme is fairly straightforward: by assumption every jet originates 
from one of the two tops or from initial state radiation, so we assign every jet to one of three 'buckets'. Jets 
in buckets $B_1$ and $B_2$ correspond to top decays, while all remaining jets are placed in $B_{ISR}$. We cycle through 
every permutation of jet assignments to minimize the distance between the invariant masses of the jets in $B_1$ 
and $B_2$ and the top mass. The metric is chosen to ensure that bucket $B_1$ reconstructs the top mass better than 
bucket $B_2$. 

Here and throughout the remainder of the paper, all Standard Model (SM) samples are generated with 
Alpgen+Pythia [13][14]. We use matrix-level matching [15] to correctly describe jet radiation over the entire 
phase space. This includes up to $t\bar{t}$+2 jets, W+4 jets and 3-5 QCD jets, with the top cross sections normalized 
to next-to-next-to-leading order [16]. Jets are reconstructed using the Cambridge/Aachen algorithm [17] of size 
R = 0.5 in FastJet [18]. Note that all our results are relatively insensitive to the choice of jet algorithm. 

We require all leptons to be hard and isolated: $p_{T,\ell} > 10$ GeV and no track of another charged particle 
within R < 0.5 around the lepton. We consider only jets with $p_T > 25$ GeV and $|\eta| < 2.5$. Even though the 
algorithm presented in this section is in principle applicable to events with any number of jets, we preselect 
events with five or more jets to reduce QCD backgrounds. Because we are interested in hadronically decaying 
$t\bar{t}$ pairs we veto isolated leptons. The restricted sample, denoted as $t_h t_h$, has a cross section of 104 pb at the 
LHC with $\sqrt{s} = 8$ TeV. One last word concerning underlying event and pile-up: unlike methods involving jet 
substructure [6] our bucket reconstruction relies on standard jets with moderately large multiplicities, so aside 
from jet energy scale uncertainties we do not expect specific experimental or theoretical challenges. 



Bucket definition 



As the goal of the bucket algorithm is to identify tops by sorting jets into categories that resemble tops, we 
need a metric to determine the similarity of a collection of jets to a top. For simple buckets $B_i$ it is 

$\Delta_{B_i} = |m_{B_i} - m_t|$ with $m_{B_i}^2 = \left( \sum_{j \in B_i} p_j \right)^2$ , (2) 

where we sum over all four-vectors in the bucket. For each event with five or more jets we permute over all 
possible groupings of the jets into three buckets $\{B_1, B_2, B_{ISR}\}$. We then select the combination that minimizes 
a global metric defined as 

$\Delta^2 = \omega \Delta_{B_1}^2 + \Delta_{B_2}^2$ . (3) 

The factor $\omega > 1$ stabilizes the grouping of jets into buckets. In this work we take $\omega = 100$, effectively decoupling 
$\Delta_{B_2}$ from the metric. As a consequence we always find $\Delta_{B_1} < \Delta_{B_2}$, i.e. $B_1$ is the bucket with an invariant 
mass closer to that of the top than the invariant mass of bucket $B_2$. Other values of $\omega$ might eventually turn 
out more appropriate for different applications. 
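
The minimization of Eq. (3) can be illustrated with a brute-force search over all assignments of jets to the three buckets. The following Python sketch is not the implementation used for the results in this paper; the function names, the representation of jets as (E, px, py, pz) tuples, and the value $m_t = 173$ GeV are assumptions made for illustration only. 

```python
import itertools
import math

M_TOP = 173.0  # GeV; assumed value, the text does not quote m_t
OMEGA = 100.0  # weight omega in Eq. (3), as chosen in the text


def inv_mass(jets):
    """Invariant mass of a list of four-vectors given as (E, px, py, pz)."""
    e = sum(j[0] for j in jets)
    px = sum(j[1] for j in jets)
    py = sum(j[2] for j in jets)
    pz = sum(j[3] for j in jets)
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))


def assign_buckets(jets, m_top=M_TOP, omega=OMEGA):
    """Try every assignment of jets to (B1, B2, B_ISR) and keep the one
    with the smallest global metric of Eq. (3).

    Returns (delta_squared, B1, B2, B_ISR). The cost grows like 3**n.
    """
    best = None
    for labels in itertools.product((0, 1, 2), repeat=len(jets)):
        b1 = [j for j, lab in zip(jets, labels) if lab == 0]
        b2 = [j for j, lab in zip(jets, labels) if lab == 1]
        if not b1 or not b2:
            continue  # each top bucket needs at least one jet
        isr = [j for j, lab in zip(jets, labels) if lab == 2]
        d1 = abs(inv_mass(b1) - m_top)  # Eq. (2) for B1
        d2 = abs(inv_mass(b2) - m_top)  # Eq. (2) for B2
        delta2 = omega * d1 * d1 + d2 * d2
        if best is None or delta2 < best[0]:
            best = (delta2, b1, b2, isr)
    return best
```

Because both orderings of the two top buckets are visited and $\omega > 1$, the minimum automatically labels the bucket closer to the top mass as $B_1$. 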

As the first selection cut we require the invariant masses of both top buckets, $B_1$ and $B_2$, to lie in the window 

155 GeV $< m_{B_{1,2}} <$ 200 GeV . (4) 
FIG. 1: Normalized transverse momentum distributions for the top decay partons in the $t_h t_h$ sample. Left: all six 
$p_{T,i}$ distributions. Central and right: normalized distributions of the 5th and 6th hardest partons for events with at least 5 
jets. Different lines in the central and right panels correspond to the different generator-level cuts on the top transverse 
momenta $p_{T,t} > 0, 100, 200$ GeV. 

The lower limit selects events above the Jacobian peak for top decays. We will see that this selection improves 
the ratio of top signal over QCD background, S/B, by about a factor of two. All buckets passing Eq. (4) are categorized by 
their number of jets: buckets including three or more jets (3j-buckets) and those including two jets (2j-buckets). 
Selecting only events with two 3j-buckets improves the signal-to-background ratio by a factor of five. 
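
As a small illustration, the mass window of Eq. (4) and the split into 3j- and 2j-buckets can be written as follows. The function names and the return convention are hypothetical and only serve this sketch. 

```python
def in_top_window(m_bucket, lo=155.0, hi=200.0):
    """Mass window of Eq. (4); all masses in GeV."""
    return lo < m_bucket < hi


def bucket_label(n_jets):
    """'3j' for three or more jets, '2j' for exactly two, None otherwise."""
    if n_jets >= 3:
        return "3j"
    if n_jets == 2:
        return "2j"
    return None


def event_class(m_b1, n_b1, m_b2, n_b2):
    """Return the pair of bucket labels, e.g. ('3j', '2j'), if both top
    buckets pass Eq. (4); return None if the event is rejected."""
    if not (in_top_window(m_b1) and in_top_window(m_b2)):
        return None
    labels = (bucket_label(n_b1), bucket_label(n_b2))
    return labels if None not in labels else None
```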

Jet selection 

For tagging two tops in the fully hadronic mode, we might naively require at least six reconstructed jets. In 
practice, with a threshold of $p_{T,j} > 25$ GeV this condition is too strict. To improve our efficiency we need 
to consider the case where one of the jets from the top pair decays is missing. It is also worth noting that even 
requiring six jets does not guarantee that we collect all six decay products of the top pair. Frequently, some of 
the observed jets come from initial state radiation instead [12]. 

In Figure 1 we plot the parton level $p_T$ distributions of the six decay partons from the top pairs. In the left 
panel we see that the four hardest decay jets are not affected by the threshold $p_{T,j} > 25$ GeV. In contrast, 
the softest distribution only peaks around 25 GeV, so roughly half the events do not pass our threshold on the 
sixth jet. 

Table I shows the number of events in the hadronic $t_h t_h$ sample after several cuts on the jet multiplicity, and 
the percentage of events with the 5th or 6th parton-level top decay jets above $p_{T,j} > 25$ GeV. In about half of 
the events with at least six jets the sixth top-decay parton falls below the $p_T$ threshold. Adding the two columns 
tells us that more than 90% of these events capture at least five of the six top decay products. Requiring only five instead 
of six jets increases the fraction of events where we miss only one of the top decay products to almost half. 
The table also shows the effect of placing a transverse momentum cut on the softer top, $p_{T,t_2}$. For a moderate 
top $p_T$ threshold our central values for the efficiencies are not strongly affected, but hadronization as well as 
detector effects might lead to significant shifts due to the steep $p_{T,j}$ behavior. 



                                          $t_h t_h$+jets [pb]   $p_{T,6} > 25$ GeV   $p_{T,5} > 25$ GeV $> p_{T,6}$ 
lepton veto                               104.1                 33.4%                44.9% 
$n_j \ge 4$                               94.3                  35.8%                46.0% 
$n_j \ge 5$                               70.5                  42.5%                46.4% 
$n_j \ge 6$                               36.7                  54.7%                38.0% 
$n_j \ge 5$, $p_{T,t_2} > 100$ GeV        32.7                  43.6%                46.2% 
$n_j \ge 5$, $p_{T,t_2} > 200$ GeV        6.7                   47.4%                44.7% 

TABLE I: Signal cross sections after requiring five or six top decay jets with $p_{T,j} > 25$ GeV. The reference value is all 
hadronic top pairs after applying the lepton veto as described in the text. 



W reconstruction 



After placing each of the jets in the event into one of three buckets ($B_1$, $B_2$, or $B_{ISR}$) we require the 
3j-buckets to contain a hadronically decaying W candidate. In the rare case of one bucket consisting of more than 
three jets we combine them into exactly three jets using the C/A algorithm and then look for a W candidate. 
As in the HEPTopTagger [10] we define a mass ratio cut 

$\left| \frac{m_{k\ell}}{m_{B_i}} - \frac{m_W}{m_t} \right| < 0.15$ (5) 

for at least one combination of jets $k,\ell$ in the bucket $i$. 2j-buckets by construction cannot satisfy 
Eq. (5). In addition, in such events one of the W decay jets is typically the softest jet and does not pass the 
$p_T$ threshold, so the W reconstruction could not succeed regardless. 
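
The W condition of Eq. (5) amounts to a loop over the jet pairs in a bucket. The sketch below assumes that the bucket has already been reduced to three jets, that jets are (E, px, py, pz) tuples, and that $m_W = 80.4$ GeV and $m_t = 173$ GeV; these values and the function names are illustrative assumptions, not taken from the text. 

```python
import itertools
import math

M_W = 80.4     # GeV; assumed value
M_TOP = 173.0  # GeV; assumed value


def inv_mass(jets):
    """Invariant mass of a list of four-vectors given as (E, px, py, pz)."""
    e = sum(j[0] for j in jets)
    px = sum(j[1] for j in jets)
    py = sum(j[2] for j in jets)
    pz = sum(j[3] for j in jets)
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))


def has_w_candidate(bucket, m_w=M_W, m_top=M_TOP, tol=0.15):
    """True if at least one jet pair (k, l) in the bucket satisfies Eq. (5).

    Buckets with fewer than three jets never pass; buckets with more than
    three jets are assumed to have been reclustered to three beforehand.
    """
    if len(bucket) < 3:
        return False
    m_bucket = inv_mass(bucket)
    if m_bucket <= 0.0:
        return False
    return any(
        abs(inv_mass([k, l]) / m_bucket - m_w / m_top) < tol
        for k, l in itertools.combinations(bucket, 2)
    )
```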



In our first, naive approach we categorize all events with two valid top buckets into three types: 

• $(t_w,t_w)$: both top buckets have W candidates as defined by Eq. (5), 

• $(t_w,t_-)$ or $(t_-,t_w)$: only the first or second top bucket has a W candidate, 

• $(t_-,t_-)$: neither top bucket has a W candidate. 

The $t_w$ or $t_-$ status is ordered as $(B_1,B_2)$, where $B_1$ is defined as the bucket closest in mass to the top. Buckets 
classified as $t_w$ have to be 3j-buckets, while $t_-$ buckets can be either 3j or 2j. 

To extract hadronic top pair events from the QCD background we can compare the different categories at 
Monte-Carlo truth level. Starting from S/B ~ 0.005 after the lepton veto, selecting only $(t_w,t_w)$ events 
yields the highest value, S/B ~ 0.09. This corresponds to an improvement of S/B by almost a factor of 20. 
To improve beyond this level, we need to require at least one, preferably two, b-tags to control the mostly 
Yang-Mills and light-flavor QCD background. 



b-tags 

To further reduce the QCD background we exploit b-tags. We assume b-tagging and mis-tagging efficiencies 
for light flavors $(\epsilon_b, \epsilon_{mis})$ of (70%, 1%), and fully account for combinatorial factors in the background. For 
the $t\bar{t}$+jets signal the effect of mis-tagging is sub-leading and can be ignored. 

To avoid combinatorics we could impose b-tagging only for the most likely b-jet in a bucket based on the 
W condition to improve S/B, as suggested in Ref. [12]. In this algorithm we do not take this option 
because it reduces the signal efficiency. We prefer to keep the maximum fraction of signal events, especially for 
the case that both signal and the main background include $t\bar{t}$ events, such as the top partner searches discussed 
below. 
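
To give a feeling for the size of the combinatorial factors, the following sketch computes the probability that an event yields at least two b-tags when every true b-jet is tagged with probability $\epsilon_b$ and every light-flavor jet with probability $\epsilon_{mis}$, independently of each other. This independent-jet model and the function name are assumptions for illustration; the text does not spell out the exact procedure used for the background estimate. 

```python
from math import comb


def prob_at_least_two_tags(n_b, n_light, eff_b=0.70, eff_mis=0.01):
    """Probability of two or more tags among n_b true b-jets and n_light
    light-flavor jets, assuming independent tagging decisions per jet."""

    def binom(n, k, p):
        # Binomial probability of exactly k successes in n trials.
        return comb(n, k) * p**k * (1.0 - p) ** (n - k)

    p_zero = binom(n_b, 0, eff_b) * binom(n_light, 0, eff_mis)
    p_one = (
        binom(n_b, 1, eff_b) * binom(n_light, 0, eff_mis)
        + binom(n_b, 0, eff_b) * binom(n_light, 1, eff_mis)
    )
    return 1.0 - p_zero - p_one
```

For a pure light-flavor event the result grows roughly like the number of jet pairs times $\epsilon_{mis}^2$, which is why requiring two tags within a high-multiplicity event enhances the mis-tagged background. 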

In any top-tagging algorithm we are interested not only in extracting the signal from backgrounds, but also in 
accurately reconstructing the original top momenta. As a measure of our reconstruction accuracy we use the 
geometric distance in the $(\eta, \phi)$ plane between the bucket momentum and the closer top parton momentum $p_t$ 
obtained from Monte-Carlo truth, 

$R_i = \min[\Delta R(B_i, p_{t_1}), \Delta R(B_i, p_{t_2})]$ . (6) 

We consider our reconstruction successful when $R_i < 0.5$. In the following, we indicate the percentage of events 
with both buckets reconstructing top momenta ($R_1 < 0.5$ and $R_2 < 0.5$), events where only $B_1$ reconstructs the 
top momentum ($R_1 < 0.5 < R_2$), and events where only $B_2$ reconstructs the top momentum ($R_2 < 0.5 < R_1$). 
The last case allows for events where the second bucket (with its worse top mass reconstruction) actually gives 
the better top direction. 
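
The matching distance of Eq. (6) can be computed directly from four-vectors. The sketch below assumes (E, px, py, pz) tuples with non-zero transverse momentum; the helper names are hypothetical. 

```python
import math


def eta_phi(p):
    """Pseudorapidity and azimuth of a four-vector (E, px, py, pz).
    Assumes non-zero transverse momentum."""
    _, px, py, pz = p
    pt = math.hypot(px, py)
    return math.asinh(pz / pt), math.atan2(py, px)


def delta_r(p, q):
    """Distance in the (eta, phi) plane, with the azimuthal difference
    wrapped into [-pi, pi)."""
    eta1, phi1 = eta_phi(p)
    eta2, phi2 = eta_phi(q)
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)


def match_distance(bucket, top1, top2):
    """R_i of Eq. (6): distance from the summed bucket momentum to the
    closer of the two truth-level top momenta."""
    total = tuple(sum(j[k] for j in bucket) for k in range(4))
    return min(delta_r(total, top1), delta_r(total, top2))


def is_matched(bucket, top1, top2, r_max=0.5):
    """Reconstruction counts as successful when R_i < 0.5."""
    return match_distance(bucket, top1, top2) < r_max
```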
                         $t_h t_h$+jets [fb]   $R_1,R_2 < 0.5$   $R_1 < 0.5 < R_2$   $R_2 < 0.5 < R_1$   QCD [fb]         W+jets [fb]      $S/B_{QCD}$ 
lepton veto              104 x 10^3            -                 -                   -                   -                -                - 
5 jets or more           70.3 x 10^3           -                 -                   -                   14643 x 10^3     96.7 x 10^3      0.005 
$m_B$ cut [Eq. (4)]      36.1 x 10^3           -                 -                   -                   4768 x 10^3      34.4 x 10^3      0.008 
$(t_w,t_w)$, 2b-tag      1811                  74.5%             6.9%                6.0%                63.2             0.58             28.65 
$(t_w,t_-)$, 2b-tag      1513                  26.8%             24.6%               7.3%                466              3.77             3.25 
$(t_-,t_w)$, 2b-tag      1066                  26.5%             8.3%                21.1%               362              2.78             2.95 
$(t_-,t_-)$, 2b-tag      1615                  9.9%              13.3%               14.7%               1348             8.55             1.20 

TABLE II: Numbers of events for simple buckets with one b-tag on each bucket, passing various levels of top reconstruction, 
as described in the text. Events that do not reconstruct tops to within $R_1, R_2 < 0.5$ are not shown, but make up 
the remaining percentage of events for each category. 



For $(t_w,t_w)$ events where each bucket contains exactly one b-jet, the top momentum reconstruction is generally 
good. As seen in Table II, about 75% of the events reconstruct both top directions well. As expected from the 
discussion above, a significant fraction of signal events only give (3j,2j)-buckets. When a W candidate is not 
found in one of the buckets, but each bucket contains a b-tag and lies in the top mass window Eq. (4), the momentum 
reconstruction is good only for the $t_w$ bucket; in these events about half of the $t_w$ buckets reconstruct the 
top direction well. All this points to using the b-tag information to improve our reconstruction algorithm. This 
will be the starting point of the improved algorithm in the next section. 



III. BOTTOM-CENTERED BUCKETS 



In Section II we have seen that we need at least two b-tags per event to control the QCD background. 
However, in the simple algorithm, each bucket does not always have exactly one b-jet and the reconstruction 
is not particularly effective for $(t_-,t_-)$ events. The obvious solution is to define buckets around b-tagged jets, 
i.e. starting each bucket with the bottom jets (which are usually the hardest jets in the event) and adding 
light-flavor jets to it. 

In this section we define buckets starting with the requirement that $B_1$ and $B_2$ each have exactly one b-jet, 
and restrict the possible permutations of jet assignments to $B_1$, $B_2$, and $B_{ISR}$ accordingly. Other than this, we 
use the same distance measure defined in Eq. (2) and Eq. (3) and select the $\{B_1, B_2, B_{ISR}\}$ giving the minimum $\Delta$. 
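
A minimal sketch of this restricted search seeds each top bucket with one b-tagged jet and only permutes the light-flavor jets. It assumes exactly two b-tagged jets per event, jets given as (E, px, py, pz) tuples, and $m_t = 173$ GeV; the treatment of events with more than two b-tags is not specified in the text and is not covered here. All names are hypothetical. 

```python
import itertools
import math

M_TOP = 173.0  # GeV; assumed value
OMEGA = 100.0  # weight omega in Eq. (3)


def inv_mass(jets):
    """Invariant mass of a list of four-vectors given as (E, px, py, pz)."""
    e = sum(j[0] for j in jets)
    px = sum(j[1] for j in jets)
    py = sum(j[2] for j in jets)
    pz = sum(j[3] for j in jets)
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))


def assign_b_centered(b_jets, light_jets, m_top=M_TOP, omega=OMEGA):
    """Minimize Eq. (3) with exactly one b-jet in each top bucket.

    Returns (delta_squared, B1, B2, B_ISR); the b-jet is the first entry
    of B1 and of B2.
    """
    if len(b_jets) != 2:
        raise ValueError("this sketch assumes exactly two b-tagged jets")
    best = None
    # Both orderings of the seeds are tried, so either b-jet can end up in B1.
    for seed1, seed2 in itertools.permutations(b_jets):
        for labels in itertools.product((0, 1, 2), repeat=len(light_jets)):
            b1 = [seed1] + [j for j, lab in zip(light_jets, labels) if lab == 0]
            b2 = [seed2] + [j for j, lab in zip(light_jets, labels) if lab == 1]
            isr = [j for j, lab in zip(light_jets, labels) if lab == 2]
            delta2 = (omega * (inv_mass(b1) - m_top) ** 2
                      + (inv_mass(b2) - m_top) ** 2)
            if best is None or delta2 < best[0]:
                best = (delta2, b1, b2, isr)
    return best
```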

Figure 2 shows the bucket masses $m_{B_1}$, $m_{B_2}$ and $m_{ISR}$. For both $t\bar{t}$ and QCD samples the $m_{B_1}$ distributions 
peak at $m_t$ by construction. The distribution is narrower for the signal. The dip in the $m_{B_2}$ distributions at 
$m_t$ is due to the large weighting factor $\omega$ in Eq. (3), which defines the bucket with mass closest to $m_t$ to be 
$B_1$. Compared with the $m_{B_1}$ distributions, the $m_{B_2}$ distributions are broad but still tend to peak toward $m_t$. 

As mentioned above, the analysis of the top buckets constructed around the b-tagged jets is the same as the 
simple algorithm described in Section II, including the bucket mass cut in Eq. (4). In Table III we show the 
corresponding results. Starting with two b-jets improves the number of $(t_w,t_w)$ events by almost 50%. Roughly 
70% of $(t_w,t_w)$ events reconstruct both tops well, essentially unchanged from the earlier analysis. One kind of 
event which is now correctly accounted for is the case where the simple algorithm finds two b-jets in the same 
bucket and gives a bucket mass in the correct range. 



FIG. 2: Bucket masses for $t\bar{t}$ (left) and QCD (right) events. We select all events with $n_j \ge 5$, skipping the mass window 
for the top buckets. 



                     $t_h t_h$+jets [fb]   $R_1,R_2 < 0.5$   $R_1 < 0.5$   $R_2 < 0.5$   QCD [fb]   W+jets [fb]   $S/B_{QCD}$ 
5 jets, 2b-tag       21590                 -                 -             -             16072      109.6         1.36 
$(t_w,t_w)$          2750                  68.9%             9.3%          7.5%          126.2      1.181         21.8 
$(t_w,t_-)$          2517                  23.4%             25.6%         8.5%          727.1      6.03          3.5 
$(t_-,t_w)$          1782                  21.8%             9.1%          22.6%         596.5      4.85          3.0 
$(t_-,t_-)$          2767                  9.0%              14.3%         13.9%         2002       14.05         1.4 

TABLE III: Signal and background rates passing various levels of reconstruction, requiring one b-jet in each of the top buckets 
$B_{1,2}$. The baseline selection cuts are the same as in Table II. 



FIG. 3: Invariant mass distributions of the b quark and the harder W decay jet. Left: parton level $m_{bj}$ with (solid) and 
without (dotted) the requirement $p_{T,j_2} < 25$ GeV $< p_{T,b}, p_{T,j_1}$. Right: $m_{bj}$ distributions for $t_-$ buckets. Black lines 
show $t_h t_h$+jets, blue lines QCD jets events. 

Asking for two b-tags within at least five jets at the very beginning produces large combinatorial factors for 
mis-tagging QCD background events. As a result the backgrounds double in each category and S/B degrades 
for $(t_w,t_w)$ events. 

While there is no obvious way to improve the $(t_w,t_w)$ category of events, Table III shows that a significant 
number of events come out as $(t_w,t_-)$ and $(t_-,t_w)$, that is, only one bucket contains a W candidate. For these 
events, the QCD background is not huge, S/B ~ 3, so we will try to improve our treatment of this fraction of 
events. 



b/jet Buckets 

In Section II we found that it is not rare for the softest top decay jet to fall below the jet $p_T$ threshold. 
Attempts to reconstruct two tops in (3j,3j)-buckets will then fail. In 94% of these cases the softest of the six 
top decay partons comes from the W decay. Restricted to events where the sixth parton falls below 25 GeV 
this fraction increases to 98.5%, i.e. whenever the sixth parton is missing the surviving two jets are the bottom 
and the harder W decay jet. In Figure 3 we first show the invariant mass of the b and the harder W decay 
product, $m_{bj}$, at parton level. We see a clear peak and an endpoint $m_{bj_1} < \sqrt{m_t^2 - m_W^2} \simeq 155$ GeV [19]. For 
events where the softer W decay jet falls below the $p_T$ threshold the peak becomes more pronounced. 
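
The endpoint quoted above follows from four-momentum conservation in the top decay. As a short worked derivation, assuming that the b quark and both W decay jets are massless and inserting the illustrative values $m_t = 173$ GeV and $m_W = 80.4$ GeV (neither value is quoted in the text): 

```latex
% Assumption: massless b quark and massless W decay jets j_1, j_2.
m_t^2 = (p_b + p_{j_1} + p_{j_2})^2
      = m_{bj_1}^2 + m_{bj_2}^2 + m_W^2
\quad\Rightarrow\quad
m_{bj_1}^2 = m_t^2 - m_W^2 - m_{bj_2}^2 \;\le\; m_t^2 - m_W^2 ,
\qquad
m_{bj_1}^{\max} = \sqrt{m_t^2 - m_W^2}
               \approx \sqrt{173^2 - 80.4^2}\ \text{GeV}
               \approx 153\ \text{GeV} .
```

The bound is saturated when the softer W decay jet carries vanishing invariant mass with the b quark, which is consistent with the peak sharpening toward the endpoint when that jet is soft. 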

The question is: can we use the predicted peak in the $m_{bj}$ distribution to identify tops in 2j-buckets? If the 
third, missing top decay jet indeed fails the $p_T$ threshold we expect the top momentum to be close to the b/jet 
momentum. The left panels of Figure 4 show the difference between the parton-level top momentum and the 
b/jet system in terms of $(p_{T,bj} - p_{T,t})/p_{T,bj} = \Delta p_T/p_{T,bj}$ and $\Delta R$. If we assume an R separation of around 
0.5 as a quality measure the result looks promising. Similarly, we should be able to reconstruct the transverse 
momentum of the top at least at the 20% level without including the softest W decay jet. In comparing our 
bucket method to top taggers, it should be emphasized that for the bucket method we allow for a missing W 
decay jet rather than replacing the lighter W decay jet by a QCD jet [12]. 



FIG. 4: Upper row: $\Delta R$ between top parton and reconstructed top from parton level simulation, $t_w$ buckets, and $t_-$ 
buckets (left to right). The dotted line at parton level shows all events, the solid lines only events with $p_T < 25$ GeV for 
the 3rd top decay product. For reconstructed tops, buckets with a single b-jet in $B_1$ and $B_2$ without $\Delta_B^{bj}$ minimization 
are shown in black, $t_-$ buckets after $\Delta_B^{bj}$ minimization in red. Lower row: $\Delta p_T/p_T^{rec}$ distributions for the same set 
of events. 

For 2j-buckets, which we know do not include all three top decay products, we replace the distance measure 
of Eq. (2) with a similar measure inspired by the distribution of $m_{bj}$, 

$\Delta_B^{bj} = \begin{cases} |m_B - 145\ \text{GeV}| & \text{if } m_B < 155\ \text{GeV} \\ \infty & \text{else} \end{cases}$ . (7) 

The peak value of 145 GeV is read off Figure 3 and should eventually be tuned to data. 

                     $t_h t_h$+jets [fb]   $R_1,R_2 < 0.5$   $R_1 < 0.5$   $R_2 < 0.5$   QCD [fb]   W+jets [fb]   $S/B_{QCD}$ 
5 jets, 2b-tag       21590                 -                 -             -             16072      109.6         1.4 
$(t_w,t_w)$          2750                  68.9%             9.3%          7.5%          126.2      1.181         21.8 
$(t_w,t_-)$          7465                  49.0%             17.8%         10.3%         2145       15.78         3.5 
$(t_-,t_w)$          997                   29.5%             19.7%         16.9%         160.2      1.42          6.2 
$(t_-,t_-)$          3979                  38.7%             17.0%         15.1%         2575       17.49         1.6 

TABLE IV: Number of events reconstructed using the b/jet-buckets for $(t_w,t_-)$, $(t_-,t_w)$ and $(t_-,t_-)$ events. The numbers 
for $(t_w,t_w)$ events are unchanged from Table III. 



Because 3j-buckets with a W candidate already reconstruct the top momentum we keep them. For top buckets in the $(t_w,t_-)$, 
$(t_-,t_w)$, and $(t_-,t_-)$ categories which do not contain a W candidate we re-assign jets, replacing Eq. (2) with the 
new distance measure. In addition, we need to remove the top mass selection cut Eq. (4). This way combinations 
of b quarks and jets which do not fall into the window of Eq. (4) are kept. The new reconstruction algorithm 
reads 

• $(t_w,t_w)$: keep these buckets as is, 

• $(t_w,t_-)$ or $(t_-,t_w)$: reconstruct the failed bucket using all non-$t_w$ jets, minimizing $\Delta_B^{bj}$, 

• $(t_-,t_-)$: use all jets to minimize $\Delta_B^{bj}$. 

Note that for reconstructing b/jet-buckets we use jets both from the $t_-$ bucket and from the ISR bucket. 
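
For a single failed bucket, the re-assignment can be sketched as a scan over the available light-flavor jets, keeping the one whose pairing with the b-jet minimizes the measure of Eq. (7). The sketch assumes that the rebuilt bucket consists of the b-jet and exactly one additional jet, and that jets are (E, px, py, pz) tuples; the function names are hypothetical. 

```python
import math


def inv_mass(jets):
    """Invariant mass of a list of four-vectors given as (E, px, py, pz)."""
    e = sum(j[0] for j in jets)
    px = sum(j[1] for j in jets)
    py = sum(j[2] for j in jets)
    pz = sum(j[3] for j in jets)
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))


def delta_bj(m_bucket, peak=145.0, endpoint=155.0):
    """Distance measure of Eq. (7); infinite above the kinematic endpoint."""
    return abs(m_bucket - peak) if m_bucket < endpoint else math.inf


def rebuild_bj_bucket(b_jet, candidate_jets):
    """Pick the candidate jet that minimizes Eq. (7) together with the b-jet.

    candidate_jets should hold the jets not used by a t_w bucket, i.e. the
    jets of the failed bucket and of the ISR bucket.
    Returns (best_jet, best_delta); best_jet is None if no pairing lies
    below the endpoint.
    """
    best_jet, best_delta = None, math.inf
    for jet in candidate_jets:
        delta = delta_bj(inv_mass([b_jet, jet]))
        if delta < best_delta:
            best_jet, best_delta = jet, delta
    return best_jet, best_delta
```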



Compared to the original algorithm we have adapted the metric for assigning jets to top buckets in the $t_-$ 
category. What remains is to replace the top mass window in Eq. (4) with appropriate b/jet values. In the 
right panel of Figure 3 we show the b/jet bucket mass distributions $m_{bj}$ for signal and background. For the 
signal they agree well with the expectation from the left panel of Figure 3. For a top candidate we require at 
least one b/jet pair satisfying 

75 GeV $< m_{bj} <$ 155 GeV . (8) 

We show the signal and background efficiencies of this new reconstruction algorithm in Table IV, along with the 
percentage of correct top reconstruction. The numbers need to be compared to Table III. First, we see that the 
number of events which contain valid top buckets in the correct mass window, albeit including one 2j-bucket, 
has significantly increased. In the $(t_w,t_-)$ category roughly half of all events reconstruct both tops well, in spite 
of missing one of the six decay jets. The number of $(t_-,t_w)$ events passing this reconstruction algorithm drops 
significantly when compared to Table III. Most of these events contain one b-jet and one non-b-tagged jet in 
$B_1$. However, the b-jet in this category of events is typically a merger of a b and the third jet from the top 
decay. Thus, while the bucket itself has an invariant mass near the top, it contains neither a W candidate nor 
a b-jet that can be combined with another jet in the event to pass the selection criteria in Eq. (8). Even in 
the $(t_-,t_-)$ category, where neither of the two buckets includes a reconstructed W candidate, the fraction of well 
reconstructed top pairs reaches almost 40%. 



To study the quality of the top reconstruction in more detail we show the difference between the bucket 
momentum and the parton level top momentum in terms of $\Delta R$ and $\Delta p_T/p_T$ in the right two panels of 
Figure 4. The buckets constructed around b-jets are shown in black. The results of replacing the $t_-$ buckets 
using the b/jet algorithm are shown in red. In the former case we see a narrow peak at zero which corresponds to 
complete top momentum reconstruction where we fail to find a W candidate due to overlapping jets. Such 
events, which are in the minority, often fail to pass the reconstruction using the $\Delta_B^{bj}$ metric. As a result, the 
narrow peak at zero is not present in this second reconstruction method. 

For $t_-$ buckets the b/jet algorithm consistently reconstructs the top direction significantly better than 
the original method. In contrast, changing $t_w$ buckets to b/jet-buckets does not improve the momentum 
reconstruction. We checked that the b/jet momentum provides a better top momentum reconstruction than using only 
the bottom momentum. 



$p_T$ dependent efficiencies 



Until now we have focused on identifying and reconstructing pairs of hadronically decaying top quarks from 
the complete signal sample. The results shown in Table IV indicate that the efficiency as well as the background 
rejection of our algorithm allows for a systematic study of hadronic top pairs. However, the fraction of events 
with not-quite-perfect reconstruction of the top directions ($R_i > 0.5$ for $i = 1, 2$) is somewhat worrisome. From 
top tagging we know that a certain fraction of relatively poorly reconstructed tops cannot be avoided [12], but 
that fraction should be small. What we need is a self-consistency requirement, or QMM¹, similar to only 
accepting reconstructed tops with $p_{T,t} > 200$ GeV in a top tagger [10]. 



FIG. 5: Correlation between $p_T$ and $\Delta R_{bj}$ (for 2j-buckets) or $\Delta R_{bjj}$ (for 3j-buckets) for reconstructed hadronic 
tops. From left to right: parton level simulation, buckets correctly reconstructing the top direction ($R_i < 0.5$), buckets 
not reconstructing the top direction ($R_i > 0.5$). The upper row shows $t_w$ buckets, the lower row $t_-$ buckets. The horizontal 
axes show the transverse momentum in GeV. 

Once we identify a top bucket we can use two observables to define such a QMM: the top momentum and 
the geometric size of the hadronic top decay. The latter is defined differently for $t_w$ buckets and $t_-$ buckets. 
In the first case we have access to all pair-wise $\Delta R$ distances between the three top decay products. We define 
$R_{bjj}$ as the maximum of the three $\Delta R$ separations of the top decay products. For $t_-$ buckets we only have one 
distance, namely $R_{bj}$ between the bottom and the hardest light-flavor jet. 
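
The two size observables can be computed as follows. The sketch assumes jets given as (E, px, py, pz) tuples with non-zero transverse momentum; the function names are hypothetical. 

```python
import itertools
import math


def eta_phi(p):
    """Pseudorapidity and azimuth of a four-vector (E, px, py, pz)."""
    _, px, py, pz = p
    pt = math.hypot(px, py)
    return math.asinh(pz / pt), math.atan2(py, px)


def delta_r(p, q):
    """Distance in the (eta, phi) plane with wrapped azimuthal difference."""
    eta1, phi1 = eta_phi(p)
    eta2, phi2 = eta_phi(q)
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)


def r_bjj(jets):
    """Size of a t_w bucket: largest pair-wise Delta R of its three jets."""
    return max(delta_r(p, q) for p, q in itertools.combinations(jets, 2))


def r_bj(b_jet, light_jets):
    """Size of a t_- bucket: Delta R between the b-jet and the hardest
    light-flavor jet, where 'hardest' means largest transverse momentum."""
    hardest = max(light_jets, key=lambda j: math.hypot(j[1], j[2]))
    return delta_r(b_jet, hardest)
```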

In Figure 5 we show the correlation between these two observables, first for parton level simulations in the 
left column. For both kinds of buckets we see a clear correlation, with the main difference being that most $t_w$ 
buckets have relatively low transverse momenta. For $t_-$ buckets, which require the softest top decay jet to fall 
below $p_{T,j} = 25$ GeV, the distribution extends to larger transverse momenta where the initial boost of the top 
can compensate for the decay momentum of the softest jet. 

The second column shows the reconstructed observables for $t_w$ and $t_-$ buckets, requiring that the buckets 
reconstruct the parton level top direction within R < 0.5. The correlation between size and transverse momentum 
is the same as expected from simulation. However, we clearly see that either large transverse momenta, 
$p_{T,t} \gtrsim 100$ GeV, or small sizes, $\Delta R_{bj(j)} < 2.5$, are preferred. This is particularly true for $t_-$ buckets. The 
reason for this is that a slight boost of the top quarks generates a geometric separation of the transverse 
back-to-back tops and the forward ISR jets. Combinations of jets from different buckets are now separated in their 
typical transverse mass values. This gives us a handle on combinatorics and improves the top reconstruction 
even in the case where one of the top decay products is missing. 



¹ We use this opportunity to, for the first time, introduce 'quality management milestones' (QMM) in a research paper. 
FIG. 6: Left: efficiency for a single bucket tag as a function of the true transverse momentum of the top, shown for $t_w$ 
buckets (black) and $t_-$ buckets (red). Dashed lines indicate events matched to a top quark within $R_i < 0.5$. Center: 
efficiencies for two bucket tags as a function of the average true $p_{T,t}$. We show $(t_w,t_w)$ events in black, $(t_w,t_-)$/$(t_-,t_w)$ 
in red, and $(t_-,t_-)$ events in green. Dashed lines again indicate reconstructed tops matched to parton level tops. Right: 
corresponding efficiency for two HEPTopTagger [9][10] tags. In all cases the last bin includes all events above 450 GeV. 



Conversely, buckets passing as tops but giving a poor directional reconstruction reside at low transverse 
momenta and large size, as can be seen in the third column of Figure 5. To veto these buckets we have a choice 
of criteria in the two-dimensional $R_{bj(j)}$ vs. $p_{T,t}$ plane. We choose the condition 

$p_T^{rec} > 100$ GeV (9) 

at the level of the buckets to increase the fraction of well reconstructed or matched top quarks in both bucket 
categories. This choice results in the highest efficiency of well-reconstructed tops in both $t_w$ and $t_-$ buckets. 
Alternative conditions in terms of $R_{bj(j)}$ or in the two-dimensional planes shown in Figure 5 could replace 
Eq. (9) in specific analyses. For example, a stricter cut will result in a higher purity of well-reconstructed tops. 

To illustrate the power of the bucket algorithm we compute the efficiency for reconstructing a single top as 
well as a top pair as a function of the transverse momenta of the tops. The left panel of Figure [6] shows the 
efficiency for a bucket tag as a function of the true transverse momentum of the top. The baseline is all fully 
hadronic tt events in the Standard Model, with five or more jets and two 6-tags. A possible mis-measurement of 
PT,t in particular at low transverse momenta explains the tail of events below the apparent consistency criterion 
Pt 1 > 100 GeV. We see that the tagging efficiency increases rapidly right at threshold. Above px.t — 150 GeV 
more than 90% of the tagged top quarks can be matched to a true top within Ri < 0.5. For p^ t = 100—150 GeV 
about 80% can be so matched. For t^, and t_ buckets the number of unmatched tops becomes negligible above 
250 GeV. Adding t„, and t_ buckets, the total efficiency of our algorithm is 60-70% for 150 < px,t < 350 GeV. 

In the central panel of Figure 6 we show the tagging efficiency for two top quarks as a function of the average 
true transverse momentum $p_T = (p_{T,t_1} + p_{T,t_2})/2$. The total efficiency is split between $(t_w,t_w)$ events (black), 
$(t_w,t_-)$ or $(t_-,t_w)$ events (red), and $(t_-,t_-)$ events (green). For each of these categories we also show the well 
reconstructed tops only. As expected, the $(t_w,t_w)$ events are reconstructed with an encouragingly high efficiency 
and an essentially negligible number of non-matched tops. For the other two categories the fraction of unmatched 
tops is slightly larger, but well under control. 

Also note that the efficiency for $(t_w,t_w)$ events is slightly higher than the square of the single bucket $t_w$ 
efficiency. This is because, once one top in an event is reconstructed, the second top becomes easier to find, due 
to combinatorial factors. Similar correlations occur in the $(t_w,t_-)$, $(t_-,t_w)$ and $(t_-,t_-)$ categories. The total 
double top tag efficiency for $p_{T,t} = 150-350$ GeV is close to the single tag efficiency: 55-70%. As we always 
search for two tops (otherwise we regard the event as un-reconstructed), the total double tag efficiency and 
total bucket tag efficiency must be closely related, as long as the individual $p_{T,t}$ and averaged $p_{T,t}$ distributions 
are similar. We should note that some of the unmatched tops may still be correct tags, as QCD effects will 
change the direction of the true top as compared to the top decay products at parton and particle level. 

The resulting cross sections of reconstructed tops with the consistency selection cut Eq. (9) are summarized 
Selection                                         $t_h t_h$+jets [fb]   $R_1,R_2 < 0.5$   $R_1 < 0.5$   $R_2 < 0.5$   QCD [fb]   $W$+jets [fb]   $S/B_{\text{QCD}}$
5 jets, 2 b-tags                                  21590                 -                 -             -             16072      109.6           1.36
$(t_w,t_w)$, $p_{T,t}^{\text{rec}} > 100$ GeV     1417                  86.4%             5.4%          4.1%          27.1       0.34            52.3
$(t_w,t_-)$, $p_{T,t}^{\text{rec}} > 100$ GeV     2875                  80.1%             7.4%          6.2%          308.3      3.44            9.3
$(t_-,t_w)$, $p_{T,t}^{\text{rec}} > 100$ GeV     309.1                 60.2%             15.4%         13.3%         26.6       0.33            11.6
$(t_-,t_-)$, $p_{T,t}^{\text{rec}} > 100$ GeV     1507                  68.5%             11.1%         11.8%         417.2      4.69            3.6
total, $p_{T,t}^{\text{rec}} > 100$ GeV           6109                  77.7%             8.2%          7.5%          779.2      8.81            7.8

TABLE V: Number of events reconstructed using the b/jet-buckets for $(t_w,t_-)$, $(t_-,t_w)$ and $(t_-,t_-)$ events with 
$p_{T,t}^{\text{rec}} > 100$ GeV cut. 




FIG. 7: $m_{T2}$ distributions for stop and top pair production using all $(t_w,t_w)$, $(t_w,t_-)$, $(t_-,t_w)$ and $(t_-,t_-)$ events from 
the b/jet-bucket algorithm, requiring $p_{T,t}^{\text{rec}} > 100$ GeV. The distributions for stop pair production with stop masses of 
400, 500 and 600 GeV are shown after a selection cut on $\slashed{p}_T > 150$ GeV, along with the $t\bar{t}$ background. The vertical 
dotted line indicates $m_{T2} = 350$ GeV. 



in Table V. The total double top tag efficiency for the $t_h t_h$+jets sample with five jets of which two are b-tagged 
is 28%. The mis-tagging efficiency of finding two valid top buckets in the pure QCD events (five jets, two 
mis-tagged as b-jets) is of the order of 5%. 
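
Both numbers follow from the "total" and "5 jets, 2 b-tags" rows of Table V. As a quick cross-check, with $\epsilon_{t_h t_h}$ and $\epsilon_{\text{QCD}}$ used here only as shorthand labels for the two efficiencies:

```latex
\begin{align}
  \epsilon_{t_h t_h}    &= \frac{6109~\text{fb}}{21590~\text{fb}} \approx 0.28 , \\
  \epsilon_{\text{QCD}} &= \frac{779.2~\text{fb}}{16072~\text{fb}} \approx 0.048 .
\end{align}
```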

Unlike for a typical top tagger, illustrated in the right panel, the efficiency of the buckets does not reach 
a plateau at large transverse momentum. Once the top decay jets start merging at the scale of the C/A jet 
size the method will fail, so for example $R_{\text{C/A}} = 0.5$ leads to a drop above $p_{T,t} \gtrsim m_t/R \sim 350$ GeV. Towards 
smaller top momenta the requirement Eq. (9) limits the efficiency by removing poorly reconstructed tops due to 
combinatorics. By construction, the bucket method targets the intermediate regime 150 GeV $< p_{T,t} <$ 350 GeV 
where it should serve as a very useful tool in Higgs searches as well as new physics searches. 
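
The upper end of this window is a simple estimate. Assuming a top mass of about 173 GeV (a value not quoted in this section), the scale at which the decay jets start to merge for $R_{\text{C/A}} = 0.5$ is

```latex
\begin{equation}
  p_{T,t} \gtrsim \frac{m_t}{R_{\text{C/A}}}
          \approx \frac{173~\text{GeV}}{0.5}
          = 346~\text{GeV} \approx 350~\text{GeV} .
\end{equation}
```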



IV. STOPS FROM BUCKETS 



As a demonstration of our algorithm for top reconstruction, we apply it to scalar top searches. Searches 
for supersymmetric, or general, top partners are becoming more and more central in ATLAS and CMS. They 
constrain the allowed stop masses to > 600 GeV [20]. Theoretically, many analysis strategies have been 
suggested, covering the semileptonic decay channel [3T], the hadronic decay channel [55], or dedicated 
HEPTopTagger studies in each of these channels [23]. 

In this section, we assume scalar top pair production followed by decay into tops and the lightest neutralino 
$\tilde{\chi}_1^0$ with 100% branching ratio. For all model points we set the lightest neutralino mass to $m_{\tilde{\chi}_1^0} = 100$ GeV. 
Cross sections at the LHC assuming $\sqrt{s} = 8$ TeV are shown in Table VI. To generate the signal for stop masses 
of 500, 600, and 700 GeV we use Herwig++ [24]. We normalize the production cross section to the Prospino 
results at next-to-leading order [25]. 

Since the reconstruction techniques described in the previous section are also applicable to tops from stop 
decays, we expect good top reconstruction. To reduce the non-top background we first apply a set of 
simple selection cuts. We require at least five jets, two of them b-tagged. Then, we require large missing 
momentum, $\slashed{p}_T > 150$ GeV, and veto isolated leptons. The results are summarized in Table VI. Because 
QCD has no intrinsic source of missing momentum, and $W$+jets has a small rate and a lepton, we ignore 
these backgrounds in this paper and assume mostly $t\bar{t}$ backgrounds with large missing transverse momentum, 
typically the result of mismeasurement or $\tau$ decay. 

Based on the algorithm developed in this paper we require two top buckets with b/jet re-ordering. The two 
reconstructed bucket momenta we denote as $p_{t_1}$ and $p_{t_2}$. After the missing momentum cut the main background 
is semi-leptonic top pairs, which means one of the two tagged tops in the background sample is mis-tagged. 

The advantage of an analysis based on fully hadronic top decays is that both tops are fully reconstructable [10, 11]. 
We use the bucket momenta to compute $m_{T2}(p_{t_1}, p_{t_2}, \slashed{p}_T)$. Its distributions for the $t\bar{t}$ background and 
the stop pair signal are shown in Figure 7. To extract stop pairs we select events with 

$m_{T2} > 350$ GeV. (10) 
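
For readers unfamiliar with $m_{T2}$, the following is a minimal numerical sketch, not the implementation used for the results here. It assumes the standard definition, the minimum over all splittings of the missing transverse momentum of the larger of the two transverse masses. The trial invisible mass `m_inv`, the crude grid-refinement minimizer and all function names are assumptions made for illustration; dedicated bisection algorithms are more robust.

```python
import math

def transverse_mass(m_vis, p_vis, m_inv, q):
    """Transverse mass of a visible system (mass m_vis, transverse momentum p_vis)
    and an invisible particle (trial mass m_inv, transverse momentum q)."""
    et_vis = math.sqrt(m_vis**2 + p_vis[0]**2 + p_vis[1]**2)
    et_inv = math.sqrt(m_inv**2 + q[0]**2 + q[1]**2)
    mt_sq = m_vis**2 + m_inv**2 + 2.0 * (et_vis * et_inv - p_vis[0] * q[0] - p_vis[1] * q[1])
    return math.sqrt(max(mt_sq, 0.0))

def mt2(m1, p1, m2, p2, pmiss, m_inv=0.0, n=10, shrink=0.3, iterations=40):
    """Approximate mT2 by scanning q1 on a (2n+1) x (2n+1) grid around the
    current best point and shrinking the grid after every pass (q2 = pmiss - q1)."""
    def objective(qx, qy):
        q2 = (pmiss[0] - qx, pmiss[1] - qy)
        return max(transverse_mass(m1, p1, m_inv, (qx, qy)),
                   transverse_mass(m2, p2, m_inv, q2))

    best_q = (0.5 * pmiss[0], 0.5 * pmiss[1])  # start from the even split
    best = objective(*best_q)
    width = math.hypot(*p1) + math.hypot(*p2) + math.hypot(*pmiss) + 1.0
    for _ in range(iterations):
        step = width / n
        cx, cy = best_q
        for i in range(-n, n + 1):
            for j in range(-n, n + 1):
                qx, qy = cx + i * step, cy + j * step
                value = objective(qx, qy)
                if value < best:
                    best, best_q = value, (qx, qy)
        width *= shrink
    return best
```

In this sketch the two bucket momenta would enter as `p1` and `p2`, with the bucket invariant masses as `m1` and `m2`, and an event would be kept if the returned value exceeds 350 GeV. For massless visible and invisible particles without additional recoil, the result can be compared with the known closed form $m_{T2}^2 = 2(|p_1||p_2| + p_1 \cdot p_2)$.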

After this cut and for a stop mass of 600 GeV we arrive at $S/B \sim 1$ and more than three sigma significance 
at the 8 TeV LHC with the currently available integrated luminosity of 25 fb$^{-1}$. In addition, the endpoint 
of the $m_{T2}$ distribution with fully reconstructed hadronic tops should allow us to precisely measure the stop 
mass [10]. All intermediate steps as well as results for other stop masses are shown in Table VI. Note that some 
numbers are different from those shown in Table V due to the leptonic decays. 
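
The quoted $S/B$ and significance can be checked directly from the cross sections in Table VI. The short sketch below assumes that the significance is the simple ratio $S/\sqrt{B}$ of expected event counts at 25 fb$^{-1}$, as the column heading of Table VI suggests; the function and variable names are illustrative only.

```python
import math

LUMI_FB = 25.0  # integrated luminosity in fb^-1

# (ttbar background [fb], 600 GeV stop signal [fb]) after each step, from Table VI
CUTFLOW_600 = {
    "2 tops reconstructed": (6.90e3, 1.30),
    "missing pT > 150 GeV": (48.53, 1.04),
    "mT2 > 350 GeV": (0.45, 0.46),
    "100% tau rejection": (0.12, 0.42),
}

def s_over_b(sig_fb, bkg_fb):
    """Signal-to-background ratio; the luminosity cancels."""
    return sig_fb / bkg_fb

def significance(sig_fb, bkg_fb, lumi_fb=LUMI_FB):
    """S / sqrt(B) computed from expected event counts."""
    return sig_fb * lumi_fb / math.sqrt(bkg_fb * lumi_fb)

for cut, (bkg, sig) in CUTFLOW_600.items():
    print(f"{cut:24s} S/B = {s_over_b(sig, bkg):8.4f}   S/sqrt(B) = {significance(sig, bkg):5.2f}")
```

With the rounded inputs from the table, the values after the $m_{T2}$ cut come out close to the tabulated $S/B$ of about 1 and a significance of about 3.5; small differences are at the level of the rounding of the quoted cross sections.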

Of all events with two reconstructed tops about 10% involve $\tau$ leptons, both for the signal and the background. 
After the missing momentum cut a significant fraction (~75%) of the top background comes from these events. 
In contrast, only 10% of the signal events include a top decay to a $\tau$. Therefore, a $\tau$-rejection would improve 
our results significantly, as shown in Table VI. 



V. CONCLUSION 



In this paper we have presented a new method to identify and reconstruct hadronically decaying top quarks. 
It is based on assigning regular jets to buckets, one for each top decay and one for initial state radiation. The 
buckets corresponding to tops are each seeded with one of the two b-jets we require in every event. If a top 
bucket includes all three top decay products it has to fulfill $W$ and top mass constraints. However, frequently 
the softer $W$ decay jet is missing, so we have to rely on the two leading jets to reconstruct a defined fraction 
of the top mass. After an appropriate re-ordering of the buckets missing the softest decay jets, both kinds of 
buckets can be used to reconstruct the top four-momentum. 

To suppress tops which for one or another reason cannot be matched to a generated top quark we apply a 
self-consistency condition, Eq. (9), to each bucket. This condition defines the lower bound of the typical transverse 



                                                          $t\bar{t}$+jets [fb]    $\tilde{t}\tilde{t}^*$ [fb]     $S/B$     $S/\sqrt{B}$
$m_{\tilde{t}}$ [GeV]                                                             500       600       700       600       600
before cuts                                               $234 \times 10^3$       80.50     23.00     7.19
veto lepton                                               $157 \times 10^3$       50.45     14.38     4.46
$\geq 5$ jets                                             $85.9 \times 10^3$      37.87     10.90     3.37
2 b-tags                                                  $28.0 \times 10^3$      11.41     3.30      1.02
2 tops reconstructed, $p_{T,t}^{\text{rec}} > 100$ GeV    $6.90 \times 10^3$      4.19      1.30      0.40      0.0002    0.08
$\slashed{p}_T > 150$ GeV                                 48.53                   2.98      1.04      0.35      0.02      0.8
$m_{T2} > 350$ GeV                                        0.45                    0.84      0.46      0.19      1.0       3.5
100% $\tau$ rejection                                     0.12                    0.77      0.42      0.17      3.6       6.1

TABLE VI: Cross sections for top background and stop pairs with masses of 500, 600, and 700 GeV after selection cuts 
and application of the b/jet bucket analysis. We assume exclusively stop decays to 100 GeV neutralinos. The significance 
for 600 GeV stops is given for an integrated luminosity of 25 fb$^{-1}$. 
momentum range 100 GeV $< p_{T,t} <$ 350 GeV to which the method is sensitive. For higher boosts the buckets 
will eventually fail due to the size of the jets they are constructed from. For top quarks with this moderate 
boost we achieve a maximum efficiency around 60-70% for the reconstruction of two top quarks. In particular, 
for $p_{T,t} < 250$ GeV our method gives a significant improvement over subjet-based top taggers, which have low 
efficiencies in this regime. 

To illustrate our approach in a new physics framework we have applied it to supersymmetric stop searches, 
relying on stop decays to tops and missing energy. Because we reconstruct the top four-momenta we can apply 
a simple $m_{T2}$ analysis, including a measurement of the stop mass. This makes stop search strategies as simple 
as sbottom or slepton searches. 

While the detailed numerical results for our method should be tested in a realistic experimental environment, 
there obviously exists a wide range of possible applications for top buckets in ATLAS and CMS. As a first step, 
hadronic top pair production with and without contributions from beyond the Standard Model might serve as 
a useful testing ground [11]. 